## Radix-2 FFT Routine Derived from CORDIC Algorithm For Surface Electromyography

Reshma.R, Anju MI

Abstract: FFT, is one of the most important operations in modern digital signal Processing and communication systems. It is one of the most used algorithms for calculating the Discrete Fourier Transform due to its efficiency in reducing computation time. Fast Fourier Transform (FFT) is used in a wide range of applications, such Orthogonal Frequency Division Multiplexing (OFDM) principle based mobile communication. Electromyography popularly known as EMG is a medical procedure that evaluates the health condition of muscles and the nerve cells that control them. These nerve cells are known as motor neurons. Each nerve cell transmits electrical signals that cause muscles to contract and relax. EMG translates these signals into graphs or numbers, helping doctors to make a efficient diagnosis. This project is about calculating muscle fatigue using EMG. The electric signal transmitted by the muscles will have lot of noises. This project proposes an ideal FFT Core implemented using CORDIC algorithm as a solution for the same.

Keywords: Coordinate Rotation Digital Computer, Discrete Fourier Transform, Electromyography, Fast Fourier Transforms, Invasive Electromyography,Orthogonal Frequency Division Multiplexing, Surface Electromyography.

#### **1** Introduction

Electromyography (EMG) is a medical procedure that evaluates the health condition of muscles and the nerve cells that control them. These nerve cells are known as motor neurons. These motor neurons transmit electrical signals that cause muscles to contract and relax. An EMG translates these electric signals sent from motor neurons into graphs or numbers generally 1 dimensional , helping doctors to make a better diagnosis.

The FFT is one of the most widely used algorithms for calculating the Discrete Fourier Transform (DFT) as its efficiency in reducing computation time. Fast Fourier Transform (FFT) has been used in a wide range of applications where the system implementation is only feasible when the equipment complexity and power consumption are greatly reduced.

Surface EMG is recorded using non invasive electrodes and intramuscular EMG signals are recorded by invasive electrodes-EMG is more widely used than the invasive counterpart. Electromyography (EMG) signals are used for physiotherapy as they are basically electrophysiological signals and hence used in both medical and engineering fields.

Whenever we record an EMG signal it will always have noise component. Hence, analyzing and classifying the EMG signals is very difficult especially when we are moving. To overcome this difficulty the signal needs to be filtered using DFT before sent to processing, the most efficient method for the same is Fast Fourier Transform.

#### 2 Literature survey

In the paper A New Approach to Pipeline FFT Processor,Shousheng He and Mats Torkelson Department of Applied Electronics, Lund University, S-22100 Lund, SWEDEN, it explains the ideal radix-2 algorithm derived by a twiddle factor decomposition technique. This helps in minimal hardware but For length N- DFT computations the hardware optimization is not ideal. In the paper

A pipelined architecture for normal I/O order FF,Xue LIU<sup>†</sup>, Feng YU<sup>†</sup><sup>‡</sup>, Ze-ke WANG,(Department of Instrument Engineering, Zhejiang University, Hangzhou 310027, China).this explains an efficient combined singlepath delay commutator-feedback (SDC-SDF). However better efficiency achieved in higher frequency than in lower frequency. In lower frequency hardware utilization is lesser tan the proposed.

In the paper E.H. Wold and A.M. Despain. Pipeline and parallel-pipeline FFT processors for VLSI implementation.Special attention is given in Proposition for implementing high performance FFT using VLSI but VLSI implementations have constraints which differ from those of discrete implementations, requiring another look at the commonly available FFT' algorithms in the light of these constraints.

The paper Low Power Hardware Implementation of High Speed FFT Core by author M. Kannan and S.K. Srivatsa Department of Electronics Engineering, Madras Institute of Technology Anna University, Chennai-44, India explains the parallel pipe lined architecture of the processor also has higher and it has throughput with lowered power consumption. However higher hardware requirement.

IJSER © 2019 http://www.ijser.org In the paper The Development of High Performance FFT IP Cores throughHybrid Low Power Algorithmic Methodology, Wei Hun, A. 7: Erdogan, 7: Arslun, und M. Hasan School of Engineering and Electronics University of Edinburgh, Edinburgh EH9 3JL, Scotland, UK. Huge importance is given for Low power consumption can be gained through the combination of hybrid low power algorithms and architectures, however improvement is needed in Comparative study conducted for different FFT processor in ideal condition real time results varies.

In the paper Pipeline FFT Architectures Optimized for FPGAs, Bin Zhou,1,2 Yingning Peng,1 and David Hwang2, 1Department of Electronic Engineering, Tsinghua University, Beijing 100084, China. This explains efficient mapping of the (SDF) fast Fourier transform (FFT) architecture to field-programmable gate arrays (FPGAs) is proposed .Improvement is required as there is no significant explanation on architectural or algorithmic choices for increasing the throughput.

#### 3 Proposed methodology

The computation of the  $N = 2^v$  point DFT is as follows. We split the *N*-point data sequence into two *N*/2-point data sequences  $f_1(n)$  and  $f_2(n)$ , corresponding to the even-numbered and odd-numbered samples of x(n), respectively, that is,

$$\begin{split} f_1(n) &= x(2n) \\ f_2(n) &= x(2n+1), \qquad n = 0, 1, \dots, \frac{N_2}{2} - 1 \end{split}$$

Thus  $f_1(n)$  and  $f_2(n)$  are obtained by reducing x(n) by a factor of 2, and hence the resulting FFT algorithm is called a *decimation-in-time algorithm*. Now the N-point DFT can be expressed in terms of the DFT's of the decimated sequences as follows:

$$\begin{aligned} X(k) &= \sum_{n=0}^{N-1} x(n) W_N^{kn}, \qquad k = 0, 1, \dots, N-1 \\ &= \sum_{n \text{ even}} x(n) W_N^{kn} + \sum_{n \text{ odd}} x(n) W_N^{kn} \\ &= \sum_{m=0}^{(N/2)-1} x(2m) W_N^{2mk} + \sum_{m=0}^{(N/2)-1} x(2m+1) W_N^{k(2m+1)} \end{aligned}$$

But  $W_{N^2} = W_{N/2}$ . With this substitution, the equation can be expressed as

where  $F_1(k)$  and  $F_2(k)$  are the *N*/2-point DFT's of the sequences  $f_1(m)$  and  $f_2(m)$ , respectively. Since  $F_1(k)$  and  $F_2(k)$  are periodic, with period *N*/2, we have  $F_1(k+N/2) = F_1(k)$  and  $F_2(k+N/2) = F_2(k)$ . In addition, the factor  $W_N^{k+N/2} = -W_N^k$ . Hence the equation may be expressed as

$$\begin{split} X(k) &= F_1(k) + W_N^k F_2(k) \,, \qquad k = 0, 1, \dots, \frac{N}{2} - 1 \\ X(k + \frac{N}{2}) &= F_1(k) - W_N^k F_2(k) \,, \qquad k = 0, 1, \dots, \frac{N}{2} - 1 \end{split}$$

that the direct The observation is of  $F_1(k)$  $(N/2)^2$  complex computation requires multiplications. The same applies to the computation of  $F_2(k)$ . Moreover there are N/2additional complex multiplications required. Hence the computation of X(k)requires almost  $2(N/2)^2 + N/2 = N^2/2 + N/2$  complex multiplications. This first step results in a reduction of the number of multiplications from  $N^2$  to  $N^2/2 + N/2$ , which is about a factor of 2 for N large.

The decimation of the data sequence can be repeated and again until the resulting sequences are reduced to one-point sequences. For the case  $N = 2^v$ , the decimation can be done  $v = \log_2 N$  times. Hence the number of complex multiplications required is  $(N/2)\log_2 N$ . The number of complex additions required is  $N\log_2 N$ .

The Architecture implemented is RSSDF Radix -2 Single path Delay Feedback. This architecture uses the registers more efficiently by storing the butterfly output in feedback shift registers. A single data goes through the multiplier at each stage. It has same number butterfly units and multipliers as in R2MDC approach, but with much reduced memory requirement: N -1 registers. Its memory requirement is minimal. FFT can be decomposed using a first half/second-half approach that divides the output sequence X(r) into increasingly smaller subsequence; this procedure is called decimation-in-frequency(DIF) FFT



Over all Block for N point FFT.



nternal Architecture of each stage of FFT

The butterfly starts storing the data into FIFO when valid in is high. Butterfly has an internal control logic which consists of three states. In first stage it receives real and imaginary data and stores it into FIFO till it is full. Once the FIFO is full it will go to next state. In this state it reads data from FIFO as well as receives input data, processes these two inputs and generates two outputs.

In second state the addition and subtraction of inputs will be performed, subtraction results of inputs are stored in FIFO and addition results are sent to next stage. Depending on stage after receiving entire input data state is incremented to next state i.e. third state. In this state the subtraction results stored in the FIFO in previous state are sent to next state. If valid in is high the received data is stored in the FIFO again.

The output data bits i.e. Data out (Real & Imaginary values) are connected to the inputs of the next stages and the valid out output signal acts as valid in input to the next stage. This is how all the 10 stages are connected to form a complete 1024 Point FFT. This Architecture is entirely scalable to tune to any size FFT. If the 1<sup>st</sup> stage is removed than it becomes a 512 point FFT. And if 2<sup>nd</sup> stage is removed it becomes a 256 point FFT. And in a similar manner if one more extra stage is added to 1024 point FFT it becomes a 1024 Point FFT.

CORDIC (for COordinate Rotation DIgital Computer), also known as Volder's algorithm, is a simple and efficient algorithm to calculate hyperbolic and trigonometric functions, typically converging with one digit (or bit) per iteration. CORDIC is an example of digit wise computing algorithms. CORDIC is commonly used when no hardware multiplier is available (e.g. in simple micro controllers and FPGA's), as the only operations it requires are addition, subtraction, bitshift and table look up. CORDIC is a member of shift-and-add algorithms.

It calculates the trigonometric functions of sine, cosine, magnitude and phase (arc tangent) to any desired precision. CORDIC is computed on the idea of "rotating" the phase of a complex number, by multiplying it by a succession of constant values. However, the multiplies can all be powers of 2, so in binary arithmetic they can be done using just shifts and adds; no actual multiplier is needed.There are two inputs real input and imaginary input. The inputs are stored in an array. The 8point FFT is generated using series of adder and multiplier. CORDIC is used in the multiplier section where we use stage level operation for the same.

#### FLOW CHART FOR THE METHODOLOGY



#### 4 Results and discussions

Simulation Result, valid in is kept high and realin and imagein inputs are given. Validout goes high and we get realdata out and imag data out.

FFT 1024 SIMULATIONS RESULTS: We have verified 8 point FFT, once this is verified we are sure that it will work for other points as well i.e. 16,32,64...1024. The reason being is that the basic building block of our architecture remains same for all stage.

In order to verify for 1024point , what we did was , we generated 2 frequencies i.e 50hz and 120hz, added up the two signals. This signal now acts as the input to 1024 Point FFT. The expected plot output for FFT is that it should be a Frequency domain view and we should get the peaks at 50 and 120hz on the graph. We have plotted both Matlab and Verilog codes output plot. And we can conclude that the output matches. There is a loss of precision and noise in Verilog output because we are using 16 point decimal format as compared to Matlab which uses 64 bit IEEE floating point format, hence this loss in accuracy. But the thing which is important is We are getting the peaks at the correct frequency confirming working of our code.



#### 5 Conclusion

The biggest concern in sing EMG is the noises generated which corrupts the electric signal generated by the muscles. The project explains an efficient FFT core for filtering the electric signals for Electromyography. This project introduces the concept of using CORDIC in generating the FFT for filtering the electric signals. Though the proposed system is efficient but has its own limitations. The current approach can be improved by using Radix 4 FFT as radix-4 completes one cycle for a given waveform hence performance is improved. Also the biggest concern with this approach is it is efficient for static signals but when we try to measure dynamic signals like running then the efficiency of Radix -2 is lesser compared to radix-4.As an enhancement we can use radix-4 FFT for creating the FFT core. The performance can be improved tremendously going for wavelet form for building the FFT signal. Another enhancement that can be added is passing a 3-D image instead of 2-D signal for calculating the EMG.

### 6 References

1. Aimei Tang, Li Yu, Fangjian Han, Zhiqiang Zhang (2016), 'CORDIC-based FFT Real-time Processing Design and FPGA Implementation', 2016 IEEE 12th International Colloquium on, pp. 233-236.

2. Adnan Ahmed Ali; Albarahany ; Liu quan, (2012), 'EMG signals detection technique in voluntary muscle movement ',6th International Conference on New Trends in Information Science, Service Science and Data Mining (ISSDM2012)

3. Ahemad Ali, C Kanagasabapathi, Siva S Yellampalli (2017),' Pipelined-Scalable FFT Core with optimized Custom Floating point engine for OFDM system', International Conference on Electrical, Electronics, Communication, Computer and Optimization Techniques (ICEECCOT).

4. Edwin Joseph, A. Rajagopal, K.karibasappa Edwin Joseph, A. Rajagopal, K.karibasappa, (2012), '. FPGA implementation of Radix-2 FFT processor based on Radix-4 CORDIC', Nirma University International Conference on Engineering (NUiCONE).

5. Nurhazimah Nazmi ;Mohd Azizi Abdul Rahman ; Saiful Amri Mazlan ; HairiZamzuri ; Makoto Mizukawa (2015), 'Electromyography (EMG) based signal analysis for physiological device application in lower limb rehabilitation', 2nd International Conference on Biomedical Engineering (ICoBE).

6. Puneet Bansal, B. S. Dhaliwal, S. S. Gill (2014), 'Memory-efficient Radix-2 FFT Processor using CORDIC Algorithm' (ICGCCEE)

7. S. Josue Saenz, Juan J. Raygoza P., Edwin C. Becerra A., Susana Ortega Cisneros, Jorge Rivera Dominguez, (), 'FPGA Design and Implementation of Radix-2 Fast Fourier Transform Algorithm with 16 and 32 Points, (2015) 'Power Electronics and Computing (ROPEC) IEEE International Autumn Meeting on, pp. 1-6.

8. Tarek Belabed ; Chokri Souani ; Sabeur Jemmali, (2014) ,'FFT implementation and optimization on FPGA',4th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP).

# **IJSER**

• Anju MI is currently pursuing Doctorate and is a teaching staff for Applied Electronics India.

<sup>•</sup> Reshma.R is currently pursuing masters degree program Applied Electronics, India, PH-9611080715. E-mail: Reshmapanicker08@gmail.com